A dynamic decision-making system that includes a mass of
indistinguishable agents could manifest impressive heterogeneity. This
kind of nonhomogeneity is postulated to result from macroscopic be- 
havioral tactics employed by almost all involved agents. A State- 
Space Based (SSB) mass event-history model is developed here to 
explore the potential existence of such macroscopic behaviors. By im- 
posing an unobserved internal state-space variable into the system, 
each individual's event-history is made into a composition of a com- 
mon state duration and an individual specific time to action. With the 
common state modeling of the macroscopic behavior, parametric sta- 
tistical inferences are derived under the current-status data structure 
and conditional independence assumptions. Identifiability and com- 
putation related problems are also addressed. From the dynamic per- 
spectives of system-wise heterogeneity, this SSB mass event-history 
model is shown to be very distinct from a random effect model via
Principal Component Analysis (PCA) in a numerical experiment.
Real data showing the mass invasion by two species of parasitic ne- 
matode into two species of host larvae are also analyzed. The analysis 
results not only are found coherent in the context of the biology of 
the nematode as a parasite, but also include new quantitative inter- 
pretations. 

1. Introduction. Consider a dynamic decision-making system consisting 
of many indistinguishable biological organisms, or agents, within a closed 
environment. Typical examples in biology and ecology include cases of a 
large fixed number of animals foraging in a common patch, many insect 
parasites invading a target host, etc. In such dynamic systems, one particularly
interesting and also very frequently encountered phenomenon is the
dramatic heterogeneity among the systems from a sample of presumably
identical systems. 

The presence of the heterogeneity among systems seems puzzling, as the 
mass of agents supposedly behave in an unsupervised fashion, and the mas- 
siveness of agents, that is, the large number of agents per system, should 
drive all systems into some sort of homogeneity. On the contrary, great 
heterogeneities among systems are often observed. One way to untangle 
this puzzle is to think of the heterogeneity as a manifestation of some 
self-organized macroscopic behavioral patterns. In this paper we consider 
a scenario, explained in Section 2, that gives rise to a particular type of 
self-organized macroscopic behavioral pattern. 

Although all agents are nearly identical morphologically and very similar 
in many key biological constructs that influence the particular decision of 
interest, the large number of agents could accommodate a distribution of 
such constructs with a sizeable range. That is, every system could contain 
individuals that represent both upper and lower extremes that are very far 
apart from the majority. They are generically called extremists [Crossan, 
Paterson and Fenton (2007)] or potential leaders [Rands et al. (2003)]. 

We postulate that the system indeed needs to be ignited by the extrem- 
ists, after which the remaining majority of followers could quickly perform 
the event of interest. In this fashion the heterogeneity among systems will 
be observed due to behavioral differences of extremists in each experimental 
system. This between-system heterogeneity is then taken as a macroscopic 
behavioral pattern because early emergence of extremists will give rise to 
crowding events much sooner than a system having late disclosure on a 
relevant temporal scale. That is, in general, between-system heterogeneity 
can be potentially caused by differences in relatively small extreme compo- 
nents within a system that involves a mass of agents. We contend that an 
accurate depiction of between-system heterogeneity will prove fundamen- 
tal to understanding the mechanism of a dynamic decision-making system, 
especially when considering underlying components of extreme nature. In 
order to successfully extract such information, a new way of modeling this 
between-system heterogeneity is required, since random effect models per 
se are mechanistically and philosophically less fit to describe the scenario 
considered above. 

Here we address heterogeneity modeling by imposing an internal state- 
space structure into the dynamic system when vital configuration informa- 
tion about the mass of agents under study is completely missing. Instru- 
mentally all individuals' decisions are correlated because they all share a 
common system state. For expositional simplicity, we consider a rather sim- 
ple internal state-space variable that has only two states. Each system sets 
off with the same first state, and then switches into the other state without
recurrence. The first state is termed the "impermissible state", in
which the particular action of interest is unlikely to occur for the majority
of agents. Only after this state-space variable changes from the impermis- 
sible state into the "permissible state" can a decision leading to an event 
possibly occur. The duration of the impermissible state is unknown. 

Though the dynamics of this state variable are seemingly strict and sim- 
ple, many biological systems can be described accurately using this con- 
struct. One example is the infective juvenile (IJ) of parasitic nematodes 
invading a host, which motivated our study. The biology of this system is 
discussed in detail in the next section. We will show that such a state-space 
variable could physically exist and be reasonably established in an unsuper- 
vised fashion. 

Based on the above structural assumption of the internal state-space vari- 
able, each individual's event-history becomes a composition of a common 
duration of the impermissible state, shared by all members of the mass of 
agents contained in the closed system, and an individual specific time to 
action within the permissible state. We will refer to this composition as the 
State-Space Based (SSB) mass event-history model. Within this model, the 
common random impermissible state duration variable is intuitively thought 
of as the time duration needed for the small group of extremists to work out 
their pioneering actions which result in the vital signals that are then de- 
tected by the rest of the agents. This common random time duration is 
the source of macroscopic correlation. Furthermore, we assume that given 
the duration of the impermissible state, random variables of individuals' 
times to action within the permissible state are independent. This conditional
independence construct is the foundation for the statistical inferences
proposed and developed here. The major goal of these inferences is to decide
whether a sample of dynamic decision-making systems really involves a
state-space structural heterogeneity.

To achieve our goal, the statistical inference needs to accommodate several 
inherent data structures. In a study involving a mass of indistinguishable 
organisms, two difficulties in data collection are often encountered: first, 
a single individual may be too difficult to be reliably marked and directly 
observed due to smallness or the lack of proper technology; second, any mea- 
surement requires sacrificing the system in one way or the other. In other 
words, the system has to be terminated at the time when a measurement 
is taken. The second difficulty limits researchers to only
one measurement per system, while the first allows only one discrete
count at any time point. To accommodate these two data situations,
we study only parametric inferences here. The particular parametric ver- 
sion of SSB mass event-history model considered here is a composition of 
the Weibull model for the impermissible state duration and Logistic model 
for time to action under the permissible state. Potential extensions of this 
parametric version are briefly discussed in the Discussion section.

To further enhance our understanding from dynamic perspectives, several
distinctions between the SSB mass event-history model and the Logistic
model with random effects are compiled through a numerical experiment.
Via Principal Component Analysis (PCA), great differences in their spectral
structures are revealed. Via cross-sectional distribution comparison along
the time axis, great differences in variability of mass event-history are also
manifested.

This paper is organized as follows. We briefly describe the biology of 
nematode invasion in Section 2 as the basis of our model structure. The 
parametric SSB mass event-history model and its corresponding likelihood 
function developments are discussed in Section 3. Then statistical inferences 
and the accompanying identifiability and computations are addressed in 
Section 4. In Section 5 two numerical experiments are conducted: one is to 
compare the mass event-history model with the random effect model, and 
the other is a simulation study of the model proposed here. In Section 6 
four real data sets of nematode invasion are analyzed. Other related issues, 
including model extension, are addressed in the discussion section. 

2. Motivating example: Mass of nematode invasion. Entomopathogenic 
nematodes (EPN) in the genus Steinernema are soil-dwelling obligate par- 
asites of insects. The infective juvenile stage (IJ) is the only life stage that 
lives outside the host and its function is to find, assess and finally infect a 
suitable host [Lewis et al. (2006)]. During the IJ stage, the nematodes are 
arrested in development with no eating, growth, mating or reproduction; all 
of these functions take place inside the host. Within hours of entering the 
host, the IJ nematodes release symbiotic bacteria that kill the host by sep- 
ticemia and toxemia within a few days. The nematodes develop into adults, 
mate and produce up to 3 generations inside a single host over the course 
of about two weeks, when the nutritional value of the host begins to decline 
and the next cohort of IJs is produced and leaves the host. Missing in this
description of the life cycle is the importance of a time frame for infection. 

Ten to hundreds of IJs infect a single host, so the first few must lead the
invasion and the remaining majority follow. This spontaneous emergence
of leaders and followers is generally predicted through a dynamic game
of the foraging group [Rands et al. (2003)]. Among the invading herd of IJs,
there is risk associated with being the first to invade for two reasons: first,
the host can mount an immune response to kill the invading nematode [Li,
Cowles and Cowles (2007), Wang and Gaugler (1994)] and second, if a single
IJ invades and no others follow, mating and successful reproduction cannot
take place. There are also risks to invading the host late in the infection,
generally associated with the declining nutritive value of the
resource. An insect host undergoing infection by entomopathogenic nematodes
is a resource with rapidly changing quality and indications thereof.
Thus, the information on which IJs base their decisions (e.g., chemical cues
produced by the infection) is dynamic. This is evidenced by the observation
that IJs' invasion behavior is not the same toward infected and healthy
hosts [Kunkel et al. (2006); Grewal, Lewis and Gaugler (1996); Christen et
al. (2007)]. To avoid the risks associated with being the first to invade a
healthy host, we would hypothesize that all IJs should collectively wait for
another to invade, but once the infection is underway, invasion should be
permissible for all and proceed rapidly to avoid the risks associated with an
old infection.

This emergence of leader-follower behavior among a mass of IJs would
collectively result in a lead time before most invasions take place that would
be shared by all IJs in the vicinity of the host (or in an individual system).
That is, for most IJs, an individual's event time to invading a host is the
lead time, or duration of the impermissible state for invasion, which is a
random variable in the nematode example, plus a random time to initiate
the action after perceiving signals indicating that the infection has begun.
Perception of these signals implies the termination of the lead time and the
beginning of the permissible state for invasions. Hence, all
IJs' event times to invasion are indeed related by the sharing of a common
lead time variable. Behaviorally speaking, this compositional event time can
explain the IJ's "wait-and-see" invasion strategy.

Available technology does not allow measuring an individual IJ's invasion
event time because IJs are less than 1 mm in length and live in the soil, and
are thus too small to be reliably observed or marked. Our measurements
of parasite infection patterns were conducted with a large number of IJs
(300) and a single host contained in a 15 ml centrifuge tube with 2 ml of
sand at the bottom. To estimate the number of IJs that invaded a host, we
exposed hosts in this manner for specified durations, then extracted those IJs
remaining in the sand by floating them from the sand in water [for a detailed
description of experimental methods, see Christen et al. (2007) and Lewis
et al., unpublished data]. The experimental system is sacrificed at the time
of collection; only one measurement per experiment is possible.

A typical data set consists of a collection of counts from the IJs' dichotomous
invasion status (invaded or not) from a sample of systems sacrificed
at several designated time points with replication. As analyzed later in
Section 6, the particular evidence sought here is the significant heterogeneity
observed among invasion counts at a time point, especially among counts
from experiments in which IJs have a relatively short exposure duration to
a host (less than 12 hours). The reason behind analyzing heterogeneity is
that values of the lead time variable in different experimental runs should
vary to a great degree due to the involvement of a mass of IJs along with
the randomly distributed values for this variable. In nature, the mass of IJs
could consist of hundreds to thousands of individual IJs. Once an infection
has begun, many IJs are likely to follow quickly, resulting in large counts of
invaded IJs. This scenario causes the heterogeneity observed in the experiment
described. Table 1 below shows a complete real data set and reveals
the typical heterogeneity in invading event counts.

To our knowledge, there is no published study that describes such a com- 
positional structure of invading event time of parasites. Confirmation of such 
a structure of event times would suggest that individual IJs use information
of host infection status to make decisions about invasion. Our goal in this 
paper is to establish a compositional structure that describes the pattern of 
invasion decisions of infective stage parasites. Such a model will be useful 
in describing and comparing the decision-making processes of animals when 
observation of individuals is impractical and the collection of data is destruc- 
tive to experimental setups. It will also establish a theoretical framework for 
asking more sophisticated questions about how parasites find, assess and 
infect their hosts. 

We postulate that if the distribution of lead times has a mode not equal 
to zero, that is, being distinct from the Exponential distribution, then a 
positive lead time component in IJ invasion event time is established. A mass 
event-history model is developed for extracting the lead time distribution 
information in the next section. 

3. State-space based mass event-history model. Let $\Omega(\omega_M)$ denote a
closed system ($\Omega$) containing a mass of $M$ agents ($\omega$'s) and a single target
host. A closed system is defined as having no agents transferring in or out
of this system. For the system as a whole, macroscopically, $\Omega$ denotes the
state-space variable that takes the value "0" for being in the impermissible
state and "1" for being in the permissible state at any time point. In the case
when each individual agent is equipped with a microscopic time-independent
potential or preference phase variable, each $\omega$ is a Bernoulli random
variable taking the symbolic value "$+$" for being in the phase of being able to invade the
target host, while the symbolic value "$-$" indicates being out of the action phase.
It is noted again that only an agent in the $\omega = +$ phase could successfully take
action upon the target host under the state $\Omega = 1$.

Table 1
First 48 hours of exposure: Steinernema feltiae infecting Galleria mellonella. Each
number represents the number of IJs that invaded a single host at the indicated exposure

Further, denote $U$ as the random time duration for $\Omega = 0$, which is not
directly observable. Since the system's state status is revealed at any time
point when the system is sacrificed, $U$ is in fact observable through its current
status, not its value. Within the system $\Omega(\omega_M)$, hypothetically each agent
would give rise to an event time $T$ from the time origin to the moment its action
is successful upon the target host. Denote the collection of event times as
$\{T_m\}_{m=1}^{M}$. Below we prescribe the SSB mass event-history model on the
system $\Omega(\omega_M)$.

SSB mass event-history model: Each event time $T_m$ has a compositional
form $T_m = U + S_m$, with $S_m$ being the individual specific time to successful
action in the permissible state $\Omega = 1$. For an agent in the $\omega_m = -$ phase,
$S_m = \infty$. Next, assume conditional independence for the collection
$\{T_m\}_{m=1}^{M}$:

[Conditional-independence] Given $U$, $T_m$ is conditionally independent of $T_{m'}$
for all $m \neq m'$.

Under the SSB mass event-history model setting, we further
consider parametric distributions for both $U$ and the conditional random variable
$T_m \mid U, \omega$ as follows:

A1. Impermissible state duration $U$ is distributed according to the Weibull
    distribution, denoted by Weibull($\lambda$, $\gamma$);
A2. The conditional survival function of $T_m$ given $U, \omega$ is logistic, that is,
    for all $t > u$,

(3.1)    $\Pr[T_m > t \mid U = u, \omega = +] = \dfrac{1}{1 + e^{\alpha + \beta (t-u)}}.$
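As a worked consequence of A1 and A2 (a sketch in our own notation, not an equation taken from the original text), the composition $T_m = U + S_m$ gives the unconditional survival function of an agent in the $\omega = +$ phase by averaging (3.1) over the lead time. Here $f_U$ denotes the Weibull($\lambda$, $\gamma$) density, and the first term accounts for systems that are still in the impermissible state at time $t$:

```latex
\Pr[T_m > t \mid \omega = +]
  = \Pr[U > t] + \int_0^t \frac{1}{1 + e^{\alpha + \beta (t-u)}}\, f_U(u)\, du,
\qquad
f_U(u) = \gamma \lambda^{-\gamma} u^{\gamma - 1} e^{-(u/\lambda)^{\gamma}},
\quad
\Pr[U > t] = e^{-(t/\lambda)^{\gamma}} .
```

Letting $t \downarrow u$ in (3.1) shows that, under this reading, an agent acts immediately at the state switch with probability $e^{\alpha}/(1 + e^{\alpha})$, so $\alpha$ controls how abruptly events appear once the permissible state begins.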

It is known that the Weibull distribution was originally derived as a distribution
of extreme events from a system consisting of many components
in a reliability context. The logistic regression model assumption for survival
times was discussed in Efron (1988) and is practical and typical for
count data. The mass of agents has variable potential phases satisfying the
following:

A3. If an agent's potential phase $\omega$ is a Bernoulli random variable with

(3.2)    $\Pr[\omega = +] = \eta,$

    then this model setting is denoted by the SSB$^+$ mass event-history
    model. The SSB mass event-history model that we will use in Sections
    4 and 5 is essentially a sub-model with $\Pr[\omega = +] = \eta = 1$.

Here we develop the likelihood function under the SSB$^+$ mass event-history
model setting. Let $N(t)$ denote an event count from a system $\Omega(\omega_M)$
at time $t$. The conditional probabilities for a positive count $N(t)\,(> 0)$ are:

(3.3)    $\Pr[N(t) \mid U = u] = c(t) \left[\dfrac{\eta\, e^{\alpha + \beta (t-u)}}{1 + e^{\alpha + \beta (t-u)}}\right]^{N(t)} \left[\dfrac{\eta}{1 + e^{\alpha + \beta (t-u)}} + 1 - \eta\right]^{M - N(t)},$

where $c(t) = \binom{M}{N(t)}$.

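Let $\theta = (\alpha, \beta, \lambda, \gamma, \eta)'$. Under the model assumptions A1, A2 and A3, the
amount of likelihood contributed by one event count $N(t)\,(> 0)$ is calculated
as

(3.4)    $L(\theta \mid N(t)) = \int_0^t \Pr[N(t) \mid U = u; \alpha, \beta, \eta]\, f_U(u; \lambda, \gamma)\, du$

         $= c(t) \int_0^t \left[\dfrac{\eta\, e^{\alpha + \beta (t-u)}}{1 + e^{\alpha + \beta (t-u)}}\right]^{N(t)} \left[\dfrac{\eta}{1 + e^{\alpha + \beta (t-u)}} + 1 - \eta\right]^{M - N(t)} \gamma \lambda^{-\gamma} u^{\gamma - 1} e^{-(u/\lambda)^{\gamma}}\, du.$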

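As for a zero count $N(t) = 0$, the amount of likelihood contributed is equal to

(3.5)    $L(\theta \mid N(t) = 0) = e^{-(t/\lambda)^{\gamma}} + \int_0^t \left[\dfrac{\eta}{1 + e^{\alpha + \beta (t-u)}} + 1 - \eta\right]^{M} \gamma \lambda^{-\gamma} u^{\gamma - 1} e^{-(u/\lambda)^{\gamma}}\, du.$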

Suppose that there is a sample of systems $\{\Omega_{ij}(\omega_M),\ i = 1, \ldots, I;\ j = 1, \ldots, J\}$.
Correspondingly, they are sacrificed at time points $t = (t_1, \ldots, t_I)$ with $J$
replications, and result in a sample of counts $\mathcal{N}(t) = \{N_{ij}(t_i)\}$. Then the
likelihood function based on data $\mathcal{N}(t)$ is computed as

(3.6)    $L(\theta \mid \mathcal{N}(t)) = \prod_{i=1}^{I} \prod_{j=1}^{J} L(\theta \mid N_{ij}(t_i)),$

where $L(\theta \mid N_{ij}(t_i))$ is based on (3.4) and (3.5).
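The likelihood in (3.3)-(3.6) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: the function names, the composite midpoint rule and the grid size `n_grid` are our own choices, and the zero-count branch adds the Weibull survival probability $\Pr[U > t]$ on the assumption that no event can occur while the system is still in the impermissible state.

```python
import math

def expit(x):
    """Numerically stable logistic function e^x / (1 + e^x)."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def weibull_pdf(u, lam, gam):
    """Weibull(lam, gam) density: gam * lam^-gam * u^(gam-1) * exp(-(u/lam)^gam)."""
    return gam * lam ** (-gam) * u ** (gam - 1.0) * math.exp(-((u / lam) ** gam))

def weibull_sf(t, lam, gam):
    """Pr[U > t] under Weibull(lam, gam)."""
    return math.exp(-((t / lam) ** gam))

def cond_prob(n, t, u, alpha, beta, eta, m):
    """Pr[N(t) = n | U = u] for u < t, the binomial form in (3.3)."""
    p = eta * expit(alpha + beta * (t - u))      # chance that one agent has acted by t
    if p <= 0.0 or p >= 1.0:                     # degenerate cases, avoid log(0)
        return 1.0 if n == (0 if p <= 0.0 else m) else 0.0
    log_c = math.lgamma(m + 1) - math.lgamma(n + 1) - math.lgamma(m - n + 1)
    return math.exp(log_c + n * math.log(p) + (m - n) * math.log(1.0 - p))

def likelihood_one_count(n, t, theta, m=300, n_grid=2000):
    """Contribution of one count N(t) = n, as in (3.4) for n > 0 or (3.5) for n = 0."""
    alpha, beta, lam, gam, eta = theta
    h = t / n_grid
    total = 0.0
    for k in range(n_grid):                      # composite midpoint rule on (0, t)
        u = (k + 0.5) * h
        total += cond_prob(n, t, u, alpha, beta, eta, m) * weibull_pdf(u, lam, gam)
    lik = total * h
    if n == 0:
        lik += weibull_sf(t, lam, gam)           # assumption: no event before the state switch
    return lik

def log_likelihood(counts, theta, m=300):
    """Log of (3.6); counts is a list of (t_i, N_ij) pairs, one per sacrificed system."""
    floor = 1e-300                               # guards against log(0) after underflow
    return sum(math.log(max(likelihood_one_count(n, t, theta, m), floor))
               for t, n in counts)

if __name__ == "__main__":
    # (alpha, beta, lambda, gamma, eta); eta = 1 gives the SSB sub-model.
    theta = (-3.0, 0.15, 4.0, 1.5, 1.0)
    print(log_likelihood([(2.0, 0), (2.0, 15), (12.0, 87)], theta))
```

Working on the log scale inside `cond_prob` keeps the binomial coefficient from overflowing when $M$ is in the hundreds; the midpoint rule is used only because it avoids evaluating the Weibull density at $u = 0$.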

Statistical inferences based on $L(\theta \mid \mathcal{N}(t))$ will be developed in the next
section. In advance, it is noted that, due to the large value of $M$, the amount
of information for parameters $\alpha$, $\beta$ and $\eta$ will be significantly larger than
that for $\lambda$, $\gamma$. This feature becomes a characteristic of the SSB/SSB$^+$ mass
event-history model setting, since it induces a way to simplify and stabilize
maximum likelihood computations involved with numerical integration, as
well as high dimensional maximization.

4. Statistical inferences and computations. In this section we first address
identifiability and then proceed to discuss the MLE computations
for statistical inference of the SSB mass event-history model. For simplicity,
we focus our discussion on the setting of the SSB, not the SSB$^+$, mass
event-history model.

4.1. Identifiability and information content issues. Since $U$ is not observable
in the compositional structure $T_m = U + S_m$, the issues of identifiability
and information content of $\theta = (\alpha, \beta, \lambda, \gamma)'$ under the SSB mass event-history
model are not entirely obvious and need clarification. The marginal distribution
of $N(t)$, computed as $\Pr[N(t) \mid \theta] = L(\theta \mid N(t))$, is specified for any given
time point $t$. This distribution contains the following factor:

(4.1)    $A(t \mid \theta) = \int_0^t \left[\dfrac{1}{1 + e^{\alpha + \beta (t-u)}}\right]^{M} f_U(u; \lambda, \gamma)\, du,$

which theoretically and practically plays an important role in deciding the
amount of information content and sheds light on the identifiability issue as
well. Its presence is found through the following two equations:

(4.2)    $\Pr[N(t) = 0 \mid \theta] = \Pr[U > t \mid \lambda, \gamma] + A(t \mid \theta),$
(4.3)    $\Pr[U < t \mid \lambda, \gamma] = \Pr[N(t) > 0 \mid \theta] + A(t \mid \theta).$

It is clear that if $\theta$ and $M$ together in (4.1) make $A(t \mid \theta)$ very small
and ignorable relative to the Weibull survival probability $\Pr[U > t_i \mid \lambda, \gamma]$
at some time points $t_i$, then from (4.2) and (4.3), for all $t_i \in t$, we have
$\Pr[U > t_i \mid \lambda, \gamma] \approx \Pr[N(t_i) = 0 \mid \theta]$. Thus, the parameters $(\lambda, \gamma)$ in the Weibull
distribution of $U$ could be extracted with good precision, and so can the logistic
parameters $(\alpha, \beta)'$. Empirical evidence indicates this is indeed the case when
the replicated $N(t_i)$ observed at one time point $t_i$ are highly heterogeneous
in the fashion that some systems have rather large numbers of individuals,
but some are zeros. This evidence requires that $\alpha$ is not too far below zero.
On the other hand, if $\alpha$ is far below zero, the factor $A(t \mid \theta)$
cannot be too small relative to $\Pr[U > t_i \mid \lambda, \gamma]$ for most of the $t_i$'s. Hence, zero
counts should be homogeneously seen among replications. Uniform counts
of zero also imply very little information content toward $\theta$.

The above two empirical phenomena, great heterogeneity vs. complete
homogeneity in zero counts, constitute evidence borne from the fact that
the distribution of $U$ does not mingle with the Logistic distribution, since
the latter is in a location-scale family and the former is not. This is the
intuition bearing on the identifiability issue.

For an analytical argument on the identifiability issue, we rewrite the marginal
distribution into the following integral form:

(4.4)    $\Pr[N(t) = k \mid \theta] = \mathbf{1}\{k = 0\}\, \Pr[U > t \mid \lambda, \gamma] + \binom{M}{k} \int_0^{\infty} G_k(t, u \mid \theta)\, f_U(u; \lambda, \gamma)\, du,$

where

(4.5)    $G_k(t, u \mid \theta) = \begin{cases} \left[\dfrac{e^{\alpha + \beta (t-u)}}{1 + e^{\alpha + \beta (t-u)}}\right]^{k} \left[\dfrac{1}{1 + e^{\alpha + \beta (t-u)}}\right]^{M-k}, & \text{if } u < t; \\ 0, & \text{if } u > t, \end{cases}$

with $k = 0, \ldots, M$.

When $\beta = 0$, the parameters $\alpha$ and $(\lambda, \gamma)'$ are completely separated within
the expression of $\Pr[N(t) = k \mid \theta]$. Therefore, we need at least
two time points, that is, $I \geq 2$, to identify $(\lambda, \gamma)'$. With $I \geq 2$, the above set
of integral equations defined through the collection of bounded and linearly
independent functions $\{G_k(t, u \mid \theta)\}_{k=0}^{M}$ would ensure the identifiability of our
SSB mass event-history model. In other words, the equality of marginal
distributions $\Pr[N(t_i) \mid \theta] = \Pr[N(t_i) \mid \theta^*]$ should imply $\theta = \theta^*$.
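The role of the factor $A(t \mid \theta)$ discussed in this subsection can be checked numerically. The sketch below assumes that, with $\eta = 1$, $A(t \mid \theta)$ is the probability that the lead time has ended before $t$ and yet none of the $M$ agents has acted, that is, the integral of $[1/(1 + e^{\alpha + \beta(t-u)})]^M$ against the Weibull density over $(0, t)$. The function names, the midpoint rule and the two values of $\alpha$ are illustrative choices of ours.

```python
import math

def weibull_pdf(u, lam, gam):
    """Weibull(lam, gam) density of the lead time U."""
    return gam * lam ** (-gam) * u ** (gam - 1.0) * math.exp(-((u / lam) ** gam))

def weibull_sf(t, lam, gam):
    """Pr[U > t] under Weibull(lam, gam)."""
    return math.exp(-((t / lam) ** gam))

def a_factor(t, alpha, beta, lam, gam, m=300, n_grid=2000):
    """Midpoint-rule approximation of A(t | theta) under the assumption stated above."""
    h = t / n_grid
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) * h
        x = alpha + beta * (t - u)
        # log of [1 / (1 + e^x)]^M, computed via log1p for stability
        log_none = -m * math.log1p(math.exp(x))
        total += math.exp(log_none) * weibull_pdf(u, lam, gam)
    return total * h

if __name__ == "__main__":
    t, beta, lam, gam = 4.0, 0.15, 4.0, 1.5
    for alpha in (-3.0, -10.0):                  # moderate vs. very negative intercept
        a = a_factor(t, alpha, beta, lam, gam)
        print(alpha, a, weibull_sf(t, lam, gam))
```

With a moderate intercept the factor is negligible next to the survival probability, so zero counts essentially reveal whether $U > t$; with a very negative intercept almost every system shows a zero count regardless of $U$, which is the homogeneous, low-information case described above.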



4.2. Computations for MLE. For maximum likelihood estimation (MLE)
computations we propose to directly maximize the likelihood function derived
in the previous section. However, two kinds of computational difficulties
face us. First, with the state-space structure imposed with the
[Conditional-independence] assumption, the likelihood function $L(\theta \mid \mathcal{N}(t))$ derived in
(3.4)-(3.6) (with $\eta = 1$ for the SSB mass event-history model specifically)
involves one-dimensional integration in each of its components. Numerical
integration errors resulting from component-wise approximations could sum
up to reach a nonignorable level. It is this difficulty that restricts us from
employing the EM algorithm, since the integration error would consequently
cause the iteration trajectories in the EM algorithm to fall into an oscillating
phase without converging to a fixed value.

Second, the aforementioned significant difference in information content
between $(\alpha, \beta)$ and $(\lambda, \gamma)$ likely causes instability in inverting the Hessian
matrix within the maximization for the 4-dimensional parameter $\theta$ via the
Newton-Raphson method. For these two difficulties, the grid search method
is recommended to robustly compute the MLE of $\theta$, denoted as $\hat{\theta}$.

Furthermore, it is interesting and important to make use of unevenness of 
information contents by carrying out a profiled likelihood type of optimiza- 
tion via grid search. We suggest the following procedure for optimization: 

Op1. First, an initial estimate of the Weibull parameters $(\lambda, \gamma)'$ could be
     calculated based on current status data: zero counts of $N(t_i)$ give rise
     to a right-censored duration of the impermissible state, while positive
     counts give rise to left-censored data. Denote this initial estimate as
     $(\lambda_0, \gamma_0)'$.

Op2. By plugging $(\lambda_0, \gamma_0)'$ into the full likelihood function $L(\theta \mid \mathcal{N}(t))$, the
     grid search is performed for an initial estimate of the Logistic parameters
     $\alpha, \beta$.

Op3. Via the profiled likelihood, we iteratively estimate $(\lambda, \gamma)'$ and $(\alpha, \beta, \eta)'$
     once or twice more.

The reason we need only iterate once or twice is that $\alpha, \beta$ can be very well
estimated even in the initial estimation.
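Step Op1 can be sketched as a grid search over a current-status Weibull likelihood. This is a minimal illustration under our own assumptions: each sacrificed system contributes only the indicator of a positive count, a zero count at $t_i$ is treated as $U > t_i$ and a positive count as $U \leq t_i$, and the grids and function names are hypothetical.

```python
import math

def current_status_loglik(data, lam, gam):
    """Log-likelihood of Weibull(lam, gam) from current-status data.

    data is a list of (t, positive) pairs, with positive True when N(t) > 0.
    """
    ll = 0.0
    for t, positive in data:
        surv = math.exp(-((t / lam) ** gam))     # Pr[U > t]
        p = (1.0 - surv) if positive else surv   # left- vs. right-censored contribution
        ll += math.log(max(p, 1e-300))           # floor avoids log(0)
    return ll

def grid_search_weibull(data, lam_grid, gam_grid):
    """Return (log-likelihood, lam, gam) at the best grid point."""
    best = None
    for lam in lam_grid:
        for gam in gam_grid:
            ll = current_status_loglik(data, lam, gam)
            if best is None or ll > best[0]:
                best = (ll, lam, gam)
    return best

if __name__ == "__main__":
    # Hypothetical design: 10 replicates at each of 2, 4 and 8 hours.
    data = ([(2.0, True)] * 2 + [(2.0, False)] * 8 +
            [(4.0, True)] * 6 + [(4.0, False)] * 4 +
            [(8.0, True)] * 10)
    lam_grid = [0.5 * k for k in range(1, 21)]
    gam_grid = [0.25 * k for k in range(1, 17)]
    print(grid_search_weibull(data, lam_grid, gam_grid))
```

The resulting $(\lambda_0, \gamma_0)'$ would then be held fixed while a second grid over $(\alpha, \beta)$ is searched in the full likelihood, as in Op2.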

With the estimate $\hat{\theta}$ that results from the above iterative procedure, we
then proceed to compute the observed Fisher information matrix $i(\hat{\theta})$ based
on the log-likelihood function $l(\theta \mid \mathcal{N}(t)) = \log L(\theta \mid \mathcal{N}(t))$, as given in the
Appendix [Fushing et al. (2008)]. This observed Fisher information matrix $i(\hat{\theta})$
could be used for interval estimation purposes.



5. Simulation: dynamic differences between the mass event-history and 
the random effect model. The random effect model per se is the most com- 
monly used methodology to accommodate observed heterogeneity in real 
data. Often it is used by assuming individual differences following a multi- 
variate Normal distribution as the cause of observed nonhomogeneity. This 
thinking is not universally applicable, because sometimes, if not most of 
the time, the observed heterogeneity is inherent and mechanistic. Successfully
modeling such mechanistic heterogeneity would advance our scientific
understanding and provide new platforms for future discoveries. Thus,
from this perspective, it is of great importance for scientists to be able to
discern individual differences from mechanistic heterogeneity, and further, 
to capture the underlying mechanism properly. In this section we explain 
this discernment. 

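One random effect model applicable to our problem setting is the Logistic
regression model with a probability of failure given in (3.1), where the parameters
$\alpha, \beta$ are assumed to be random. The likelihood is calculated as follows:

(5.1)    $L^{RE}(\theta^{RE} \mid \mathcal{N}(t)) = \prod_{i=1}^{I} \prod_{j=1}^{J} \left\{ \iint c(t) \left[\dfrac{\eta\, e^{\alpha + \beta (t-u)}}{1 + e^{\alpha + \beta (t-u)}}\right]^{N(t)} \left[\dfrac{\eta}{1 + e^{\alpha + \beta (t-u)}} + 1 - \eta\right]^{M - N(t)} f(\alpha, \beta \mid \theta^{RE})\, d\alpha\, d\beta \right\}.$

Throughout this section we take $\eta = 1$ for expositional simplicity.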

In the first part of this section a computer experiment is devised and 
performed to characterize the dynamic differences between the SSB mass 
event-history and the random effect model in generating the mass event 
count trajectory from individual systems. In the second part another simu- 
lation study is conducted to evaluate the performance of profiled likelihood 
computations under the SSB mass event-history model.

5.1. Computer experiment for dynamics of SSB mass event-history model.
The protocol for the computer experiment is as follows:

1. Under the SSB mass event-history model with a chosen vector value of
$\theta_0 = (\alpha_0, \beta_0, \lambda_0, \gamma_0)' = (-3, 0.15, 4, 1.5)'$ and the number of agents $M = 300$,
100 independent mass event count trajectories are simulated,
with each replication governed by the following experimental design:

(a) In the $h$th experiment, $h = 1, \ldots, 100$, one random lead time $U_h$
is generated from the Weibull distribution with parameter $(\lambda_0, \gamma_0)'$, and
300 random follow-up survival times are generated from the Logistic distribution with
parameter $(\alpha_0, \beta_0)'$, denoted as $\{S_{h,m}\}_{m=1}^{300}$.

(b) For the complete data set $\{T_{h,m} = U_h + S_{h,m};\ h = 1, \ldots, 100;\ m = 1, \ldots, 300\}$,
we count the cumulative number, $N_h(\tau)$, of events falling into
$[0, \tau + 1)$ for time $\tau = 0, 1, \ldots, 60$ (hrs), and denote the complete trajectory
by $N_h = \{N_h(\tau)\}_{\tau=0}^{60}$.

2. To mimic the real data that will be discussed in the next section, only one
event count is selected from each of the 100 trajectories. We set $I$ scheduled
time points, and randomly divide the 100 trajectories into $I$ groups
with the common group size being $J$. With $I = 10$ and $J = 10$, we take
the scheduled time points $\{t_i,\ i = 1, \ldots, I\} = \{2, 4, 6, 8, 10, 12, 24, 36, 48, 60$
(hrs)$\}$. Then within the $i$th group, the cumulative number of events
falling into $[0, t_i + 1)$ is recorded as $N_{ij}(t_i)$, for $i = 1, \ldots, I$ and $j = 1, \ldots, J$.
Thereby we simulate the data of mass event counts denoted as
$\{N_{ij}(t_i) \mid i = 1, \ldots, I;\ j = 1, \ldots, J\}$. One simulated sample data set is shown in Table 2
for illustration.

3. We then fit the SSB mass event-history model to the above data
$\{N_{ij}(t_i) \mid i = 1, \ldots, I;\ j = 1, \ldots, J\}$, and compute the MLE $\hat{\theta}^{SSB}$ of $\theta = (\alpha, \beta, \lambda, \gamma)'$ and
the corresponding likelihood value $L(\hat{\theta}^{SSB} \mid \mathcal{N}(t))$. The MLE $\hat{\theta}^{SSB}$ is computed
following the procedure described in Section 4.2.

4. Next we fit the Logistic regression model with Normal random effects on
$(\alpha, \beta)'$, assuming that $(\alpha, \beta)'$ follows a bivariate Normal distribution with
means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$.
We denote the parameter vector of this random effect model as $\theta^{RE} =
(\mu_1, \mu_2, \rho, \sigma_1, \sigma_2)'$. Compute the MLE $\hat{\theta}^{RE}$ and the corresponding
likelihood value $L^{RE}(\hat{\theta}^{RE} \mid \mathcal{N}(t))$.

5. With $\hat{\theta}^{RE}$, 100 random samples of $(\alpha_h, \beta_h)'$ are generated for $h = 1, \ldots, 100$.
For each pair $(\alpha_h, \beta_h)$ and $M = 300$, a complete logistic mass event
count trajectory is generated and denoted by $N_h^{RE}$.
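Steps 1 and 2 of the protocol can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: follow-up times are drawn by inverting the logistic survival function in (3.1) and truncated at zero so that no event precedes the lead time, which is our reading of the model; the seed and function names are arbitrary.

```python
import math
import random

def follow_up_time(rng, alpha, beta):
    """Draw S with Pr[S > s] = 1 / (1 + exp(alpha + beta * s)) for s >= 0."""
    v = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep v strictly inside (0, 1)
    s = (math.log((1.0 - v) / v) - alpha) / beta       # inverse of the logistic survival
    return max(s, 0.0)                                 # assumption: truncate at the state switch

def simulate_trajectory(rng, alpha, beta, lam, gam, m=300, horizon=60):
    """Return (U, [N(0), ..., N(horizon)]), where N(tau) counts events in [0, tau + 1)."""
    u = rng.weibullvariate(lam, gam)                   # scale lam, shape gam
    times = sorted(u + follow_up_time(rng, alpha, beta) for _ in range(m))
    counts, idx = [], 0
    for tau in range(horizon + 1):
        while idx < m and times[idx] < tau + 1:
            idx += 1
        counts.append(idx)
    return u, counts

def one_count_per_system(rng, trajectories, time_points, reps):
    """Step 2: split trajectories at random into groups and keep one count from each."""
    order = list(range(len(trajectories)))
    rng.shuffle(order)
    table = []
    for i, t in enumerate(time_points):
        group = order[i * reps:(i + 1) * reps]
        table.append([trajectories[h][t] for h in group])
    return table

if __name__ == "__main__":
    rng = random.Random(2008)                          # arbitrary seed
    schedule = [2, 4, 6, 8, 10, 12, 24, 36, 48, 60]
    sims = [simulate_trajectory(rng, -3.0, 0.15, 4.0, 1.5) for _ in range(100)]
    table = one_count_per_system(rng, [c for _, c in sims], schedule, 10)
    for t, row in zip(schedule, table):
        print(t, row)
```

Each printed row holds the ten replicate counts at one scheduled time, which is the one-measurement-per-system structure that the destructive sampling in the real experiments imposes.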



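Table 2
A simulated sample: SSB-MEHM with $(\alpha, \beta, \lambda, \gamma)' = (3.5, 0.15, 4, 1.5)'$

| 2hrs | 4hrs | 6hrs | 8hrs | 12hrs | 16hrs | 20hrs | 30hrs | 45hrs | 60hrs |
|------|------|------|------|-------|-------|-------|-------|-------|-------|
|   15 |      |   26 |   30 |    40 |    87 |   104 |   228 |   283 |   294 |
|   18 |   23 |   22 |   29 |    58 |    97 |   120 |   225 |   287 |   296 |
|   22 |   16 |   16 |   24 |    42 |    85 |   132 |   243 |   287 |   299 |
|      |   16 |   17 |   19 |    35 |   110 |   119 |   191 |   277 |   300 |
|      |      |   17 |   44 |    57 |    83 |   115 |   197 |   285 |   300 |
|      |      |   25 |   33 |    68 |    42 |   119 |   208 |   281 |   295 |
|      |      |      |   39 |    40 |    97 |   132 |   226 |   289 |   299 |
|      |   13 |   12 |   38 |    42 |    55 |   130 |   236 |   267 |   300 |
|   27 |   13 |   16 |   20 |    59 |    96 |    67 |   137 |   231 |   300 |
|   21 |   16 |   22 |   20 |    37 |    80 |   118 |   189 |   293 |   299 |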



Steps 2, 4 and 5 in the above protocol provide a platform for meaningfully
comparing two generating dynamics of mass event count trajectories. It is
observed that the log-likelihood ratio
$\log\big( L(\hat\theta_{SSB} \mid \mathcal{N}(t)) / L_{RE}(\hat\theta_{RE} \mid \mathcal{N}(t)) \big) > 60$
in this simulated case. This difference in log-likelihood value is rather
significant given that the number of parameters in the random effect model
is 5, while the SSB mass event-history model involves only 4 parameters.
That is, by either the AIC or the BIC model selection criterion, mass event
count data generated from the SSB mass event-history model would be unlikely
to be mistaken as being generated from the Logistic regression model with
random effects.
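
The comparison can be made explicit with a short worked calculation. Here $\ell_{SSB}$ and $\ell_{RE}$ are shorthand, introduced for this sketch, for the two maximized log-likelihoods. The sample size entering BIC is assumed to be the $n = I \times J = 100$ observed counts, which the text does not state explicitly. With the reported gap $\ell_{SSB} - \ell_{RE} > 60$:

```latex
\begin{align*}
\mathrm{AIC}_{SSB} - \mathrm{AIC}_{RE}
  &= -2(\ell_{SSB} - \ell_{RE}) + 2(4 - 5) < -120 - 2 = -122, \\
\mathrm{BIC}_{SSB} - \mathrm{BIC}_{RE}
  &= -2(\ell_{SSB} - \ell_{RE}) + (4 - 5)\log n < -120 - \log 100 \approx -124.6 .
\end{align*}
```

Both differences are strongly negative, so either criterion favors the SSB model. Because the SSB model has one fewer parameter, the penalty term only widens the gap.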

Further, from the dynamic perspective, this computer experiment is designed
to bring out the following three aspects of characteristic differences
between the SSB mass event-history model and the Logistic regression model
with random effect: first, the longitudinal mean curve of mass event counts;
second, the cross-sectional distribution of mass event counts; third, the
percentage of total variation explained by principal eigenvectors through
principal component analysis (PCA).

Longitudinal mean curve: Two main features of the longitudinal mean
curve are informative for dynamics comparison: the event onset and the
steepest increment. As shown in Figure 1, especially over the first 10 hours,
a horizontal discrepancy is evident between the "true mean curve" of
$\{N_h\}_{h=1}^{100}$ from the SSB mass event-history model and the mean
curve of $\{N_h^{RE}\}_{h=1}^{100}$ from the Logistic regression model with
random effect. This difference implies that the Logistic model tends to
predict event onset much earlier than the SSB mass event-history model does.
It is also observed that the steepest increment of the mass event count under
the Logistic model likely occurs ahead of that of the SSB mass event-history
model. Ideally, confidence bands should be added onto the mean curves to
demonstrate the variation along the temporal axis. We refrain from doing so
because the 9 curves contained in the resultant figure become
indistinguishable and complicate the visualization of the horizontal
difference comparison. As will be explained below and illustrated in
Figure 2, the confidence band for the Logistic model with random effect is
much narrower than the two related to the SSB mass event-history model.

Fig. 1. Comparison of mean curves.

Cross-sectional distribution: Comparing cross-sectional distributions along
the temporal axis offers an essential perspective in dynamics comparison. It
is particularly informative when two dynamics give rise to very different
distribution forms, as seen in Figure 2. We perceive detailed and significant
differences in distribution shapes at all three time points considered.
At the 4th hour, 60% of cases show no infections under the SSB mass
event-history model. In contrast, the Logistic model predicts that all hosts
are invaded and accumulate event counts of up to 40. By the 16th hour, the
two distributions are centered at different locations with significantly
different variations. Further, by the 30th hour, the two distribution forms
continue to change in distinct fashions: the Logistic one becomes highly
concentrated, while the mass event-history one still drags a long, heavy tail
on the left-hand side. These different distribution shapes suggest that the
confidence bands for the mean curves in Figure 1 might not be as meaningful
as they would be if only bell-shaped distributions were involved.

[Figure 2 panels: up to time = 4 hours; up to time = 16 hours; up to time = 30 hours. Horizontal axis: total number of invasions.]

Fig. 2. Frequency comparison of nematodes' invasion counts.

It is worth noting that the heterogeneity revealed in the simulated data
$\{N_{ij}(t_i) \mid i = 1, \ldots, I;\ j = 1, \ldots, J\}$, or in the trajectories
$\{N_h\}_{h=1}^{100}$, can be very drastic. In the simulated data of Table 2,
for example, we see a typical heterogeneity in the simulated vector of event
counts at $t_1 = 2$ hours:
$\{N_{1j}(2)\} = \{15, 18, 22, 0, 0, 0, 0, 0, 27, 21\}$. This kind of
heterogeneity is very compatible with that shown in the real data of Table 1,
and is indeed observed in three out of the four real data sets analyzed in the
next section.
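
The degree of heterogeneity in the 2-hour counts quoted above can be quantified with a quick calculation. The sketch below compares the sample variance with the variance expected if all ten systems shared a single binomial event probability. The binomial benchmark, the use of $M = 300$ agents per system, and the name `dispersion_index` are assumptions made for illustration; they are not part of the authors' analysis.

```python
import numpy as np

# The ten simulated counts at 2 hours, as quoted in the text.
counts_2h = np.array([15, 18, 22, 0, 0, 0, 0, 0, 27, 21])
M = 300  # agents per system (assumed, matching M = 300 in the protocol)

# Share of systems in which no event has occurred yet.
zero_fraction = np.mean(counts_2h == 0)

# Variance expected if every system had the same binomial event probability.
p_hat = counts_2h.mean() / M
binomial_var = M * p_hat * (1 - p_hat)

# Observed between-system variance and its ratio to the binomial benchmark.
sample_var = counts_2h.var(ddof=1)
dispersion_index = sample_var / binomial_var

print(f"fraction of systems with no events: {zero_fraction:.2f}")
print(f"sample variance / binomial variance: {dispersion_index:.1f}")
```

Half of the systems show no events at all, and the between-system variance is roughly 13 times what a homogeneous binomial model would give.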

Principal component analysis (PCA): As the covariance function is a key
feature of stochastic processes in general, the temporal covariance matrix
provides a characteristic aspect of the dynamic mechanism. By taking each
trajectory as a 60-dimensional vector, excluding the 0-hour, three temporal
covariance matrices are computed based on $\{N_h\}_{h=1}^{100}$,
$\{N'_h\}_{h=1}^{100}$ and $\{N_h^{RE}\}_{h=1}^{100}$, respectively. Here we use
PCA to summarize each $60 \times 60$ temporal covariance matrix for comparing
dynamics from the temporal variation perspective.
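
The PCA summary used here, the cumulative percentage of total variance explained by successive principal components, can be sketched with NumPy. This is a generic illustration rather than the authors' code: the function name `cumulative_variance_explained` is introduced here, and the toy trajectories are random placeholders, not output of either model.

```python
import numpy as np


def cumulative_variance_explained(trajectories):
    """Cumulative % of total variance explained by the principal components.

    trajectories: array of shape (n_trajectories, n_timepoints), e.g. (100, 60).
    """
    X = np.asarray(trajectories, dtype=float)
    cov = np.cov(X, rowvar=False)            # temporal covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    return 100.0 * np.cumsum(eigvals) / eigvals.sum()


rng = np.random.default_rng(1)
# Toy non-decreasing trajectories over 60 hours (placeholder data only).
toy = np.cumsum(rng.poisson(5.0, size=(100, 60)), axis=1)
pct = cumulative_variance_explained(toy)
print(pct[:3])
```

Plotting `pct` against the component index for each set of trajectories gives curves of the kind compared in Figure 3.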

Three curves of cumulative percentages of total variance explained by
the principal components are plotted in Figure 3; the two curves related
to the SSB mass event-history models overlap each other. Indeed, Figure 3
provides strong evidence that the Logistic regression model with random
effect cannot capture the dynamics governed by the SSB mass event-history
model. This conclusion is based on the following evidence. In the Logistic
model, the first principal component explains less than 50% of the total
variation of its own invasion trajectories, and up to the 7th principal
component is required to reach the level achieved by the first principal
component of the original 100 simulated invasion trajectories generated via
the SSB mass event-history model.

Thus, from the above three critically important aspects, we conclude that
the SSB mass event-history model and the Logistic model with random effect
are very distinct dynamics for generating mass event count trajectories,
or time series. Further comparison of these two dynamics from the model
selection perspective is carried out in the real data analysis reported in
the next section. Here we reiterate that scientific investigations attempting
to accommodate heterogeneity in data should not end at a particular model
with random effect. It is essential that models explain mechanisms beyond
individual differences.

5.2. Simulation study for SSB mass event-history model. A simulation
study according to step 3 of the computer experiment in Section 5.1 is
performed to evaluate the profiled likelihood computations under the SSB mass
event-history model. That is, based on the above simulated data
$\{N_{ij}(t_i) \mid i = 1, \ldots, I;\ j = 1, \ldots, J\}$, the MLE $\hat\theta_{SSB}$ of
$\theta = (\alpha, \beta, \lambda, \gamma)'$ is computed by iteratively maximizing the
profiled likelihood under the likelihood function $L(\theta \mid \mathcal{N}(t))$.
The procedure follows Op1-Op3 described in Section 4.2. The results of 300
replications of the MLE $\hat\theta_{SSB}$ are summarized in Table 3 below. The
simulation results confirm that the information content of $\alpha, \beta$ is very
different from that of $\lambda, \gamma$, and that computations via the profiled
likelihood approach work rather well.



Table 3

Summary of parameter estimation based on SSB-MEHM from 300 simulations

Parameter    True value    Mean       Standard deviation
$\lambda$    4             3.9909     0.6235
$\gamma$     1.5           1.7394     0.6445
$\alpha$     -3            -3.0050    0.0992
$\beta$      0.15          0.1503     0.0051
$\lambda$    4             4.6101     0.6343
$\gamma$     1.5           1.6414     0.3830

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6. Real data analysis and its biological implications. In this section we 
analyze four data sets collected from the experimental setup described in
Section 2 for two IJ species of nematode (S. carpocapsae and S. feltiae)
vs. two host species (G. mellonella and T. molitor). In addition to the
background provided there, we summarize some biological information
regarding interactions between IJs and hosts [for further details, see Lewis
et al. (2006)]. This brief summary helps put our data analysis into context.

Once an IJ invades a host, it resumes development, which makes the
decision to invade the host irreversible. Consequently, the decision directly
influences its own fitness plus the environment to be experienced by its
offspring. The foraging behaviors of IJs of various species of EPNs differ
significantly [Lewis et al. (2006)]: S. carpocapsae ambushes hosts by standing
on its tail waiting for a passing host; in contrast, S. feltiae IJs move through
the soil searching for a potential host. The two hosts also differ with respect
to their acceptability to each EPN species; G. mellonella is the preferred
host of both nematode species [Lewis et al. (1996)]. Further, the degrees
of interaction between these two IJ species and two host species are not
at all uniform: S. carpocapsae has poorer performance than S. feltiae in T.
molitor, but has better performance in G. mellonella.

Four data sets corresponding to four combinations of IJ and host species
with exposure time durations (2, 4, 6, 8, 12, 18, 24, 48) (hrs) are presented
here. Interestingly, three of the four data sets contain very heterogeneous
mass event count data similar to the data generated from the SSB model
in Table 2, the exception being S. carpocapsae with the host G. mellonella.
These four data sets are individually analyzed based on 5 statistical models:
(1) Logistic regression model with fixed effect (LRM); (2) Logistic regression
model with fixed effect and agent's infectivity phase (LRM+); (3) Logistic
regression model with random effect (LRM$_{RE}$); (4) SSB mass event-history
model (SSB-MEHM); and (5) SSB+ mass event-history model (SSB+-MEHM).

Consider the Logistic regression model with fixed effect (LRM) as the
baseline model. We compare the five models via Schwarz's (1978)
information criterion (BIC). In this application, differences in BIC values
between each of the four other models and LRM are computed through the
formula
$(-2)[l(\hat\theta_{model} \mid \mathcal{N}(t)) - l(\hat\theta_{LRM} \mid \mathcal{N}(t))] + (p - 2) \times \log(N)$,
as reported in Table 4, where $p = \dim(\theta_{model})$ is the parameter
dimension of a "model" among the four models other than LRM
($2 = \dim(\theta_{LRM})$) and $|\mathcal{N}(t)| = N$ is the sample size.
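
The BIC difference formula above translates directly into a small helper. The function name `bic_difference_from_lrm` and the log-likelihood values in the example are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def bic_difference_from_lrm(loglik_model, loglik_lrm, p, n):
    """BIC(model) - BIC(LRM) = -2 [l(model) - l(LRM)] + (p - 2) log(n).

    p is the parameter dimension of the candidate model; LRM has 2 parameters.
    n is the sample size.
    """
    return -2.0 * (loglik_model - loglik_lrm) + (p - 2) * math.log(n)


# Hypothetical log-likelihoods, for illustration only.
delta = bic_difference_from_lrm(loglik_model=-1200.0, loglik_lrm=-1500.0, p=4, n=80)
print(round(delta, 2))
```

A negative value means the candidate model has a smaller BIC than the LRM baseline, so the more negative the entry, the stronger the preference for that model.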

Based on the BIC criterion, the SSB+ mass event-history model (SSB+-MEHM)
is the selected model in three settings: S. feltiae in the two host species
G. mellonella and T. molitor, and S. carpocapsae in T. molitor. In one
setting, S. carpocapsae in G. mellonella, the BIC selects the Logistic
regression model with random effect (LRM$_{RE}$). These results agree with the
real data sets regarding the presence of heterogeneity in mass event counts
like that shown in Table 2. Thus, we can statistically infer that the
SSB+-MEHM can best capture the interaction of IJs' behaviors toward host
species which give rise to heterogeneous mass event counts.

Table 4

Differences of BIC criterion values from LRM

Host                    G. mellonella                  T. molitor
Infective Juvenile      S. carpocapsae   S. feltiae    S. carpocapsae   S. feltiae

LRM+                    -19.98           -2727.86      -1724.24         -1050.10
LRM$_{RE}$              -555.96          -4477.60      -4418.74         -2380.30
SSB-MEHM                -471.98          -4525.74      -4500.50         -2422.20
SSB+-MEHM               -459.96          -4551.60      -4552.74         -2446.30
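
The selection described above can be checked directly from the BIC differences in Table 4 by picking, for each nematode and host combination, the model with the smallest value. The dictionary layout and variable names below are illustrative choices. LRM itself is included with a difference of 0, since every entry is measured relative to it.

```python
# BIC differences from LRM, copied from Table 4. Column order:
# S. carpocapsae / G. mellonella, S. feltiae / G. mellonella,
# S. carpocapsae / T. molitor,    S. feltiae / T. molitor.
settings = [
    "S. carpocapsae in G. mellonella",
    "S. feltiae in G. mellonella",
    "S. carpocapsae in T. molitor",
    "S. feltiae in T. molitor",
]
bic_diff = {
    "LRM": [0.0, 0.0, 0.0, 0.0],  # baseline
    "LRM+": [-19.98, -2727.86, -1724.24, -1050.10],
    "LRM_RE": [-555.96, -4477.60, -4418.74, -2380.30],
    "SSB-MEHM": [-471.98, -4525.74, -4500.50, -2422.20],
    "SSB+-MEHM": [-459.96, -4551.60, -4552.74, -2446.30],
}

# For each setting, select the model with the smallest BIC difference.
selected = {}
for k, setting in enumerate(settings):
    selected[setting] = min(bic_diff, key=lambda model: bic_diff[model][k])

for setting, model in selected.items():
    print(f"{setting}: {model}")
```

The output reproduces the pattern stated in the text: LRM_RE for S. carpocapsae in G. mellonella and SSB+-MEHM for the other three settings.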

Biologically, the results reported in Table 4 support the current
understanding of behavioral traits. The majority of individual S. feltiae
IJs, when encountering either species of host, are likely adopting a
"wait and see" policy. Many risk-averse individuals collectively wait until
a very few extremists, the risk-prone individuals, invade. These features are
reflected in the $\gamma$ estimates being significantly different from 1 in
Table 5, in which all parameter estimates for all five models are reported.
In sharp contrast, the individual infection decisions of S. carpocapsae IJs
when encountering their favored host, G. mellonella, are likely independent
of each other. The capability of making a behavioral adjustment to adapt to
host differences may not be new, but could be new as a computational outcome:
when encountering a less favorable host, such as T. molitor, IJs of
S. carpocapsae adopt the "wait and see" policy similar to S. feltiae IJs, and
have a very different policy when they encounter a favorable host, such as
G. mellonella. These results indeed lead to an interesting hypothesis for the
biology and distribution of EPNs: when nematodes are associated with more
susceptible hosts, their distribution should be less aggregated.

Table 5

Estimation comparison

Host                        G. mellonella                  T. molitor
Infective Juvenile          S. carpocapsae   S. feltiae    S. carpocapsae   S. feltiae

LRM          $\alpha$       -0.4652          -2.0360       -1.3312          -2.5218
             $\beta$         0.0870           0.1694        0.0341           0.0839

LRM+         $\alpha$       -0.5353          -4.9465       -3.9497          -3.5473
             $\beta$         0.1019           0.6124        0.7279           0.2054
             $\eta$          0.9734           0.8477        0.4494           0.6901

LRM$_{RE}$   $\mu_1$        -0.4764          -3.5563       -2.9760          -3.6238
             $\mu_2$         0.0911           0.2438        0.0649           0.1092
             $\sigma_1$      0.7385           0.9970        0.9997           0.9992
             $\sigma_2$      0.0451           0.0648        0.0738           0.0123
             $\rho$         -0.8570           0.6176        0.9966           0.8337

SSB-MEHM     $\alpha$       -1.3973          -3.7067       -3.0608          -3.2272
             $\beta$         0.3223           0.5538        0.8603           0.3821
             $\lambda$     243.3073         212.2950       54.9325          80.1841
             $\gamma$        0.9362           1.0196        2.0094           1.8068

SSB+-MEHM    $\alpha$       -1.4321          -3.8171       -2.8765          -3.0832
             $\beta$         0.3342           0.5716        1.1390           0.5141
             $\eta$          0.9998           0.9659        0.7251           0.7700
             $\lambda$      98.0942          95.1340       60.1455          72.6600
             $\gamma$        1.0011           0.7598        1.9859           1.8790

7. Discussion. We developed the SSB mass event-history model for modeling
potentially self-organized decision-making data obtained from a system
consisting of many biological organisms or agents. Such self-organized
behaviors in general create macroscopically correlated patterns that underlie a
large number of event times within the same system, and render tremendous
heterogeneity between replicated systems. This type of manifestation is
beyond what general individual-difference based random effect models could
accommodate. Our mass event-history models are built with simple internal
state-space dynamics for the "wait-and-see" behavioral tactics: the imper- 
missible state represents the behavior of waiting by the extremist or leader to 
take the first action; the permissible state models the cascade of many follow- 
ers' decision-making. With this dynamic structure, our mass event-history 
models shed light on biological and behavioral patterns of decision-making 
pertaining to many agents sharing a common environment. 

From the perspective of statistical merit, our SSB mass event-history 
models provide a simple and instrumental methodology for accommodating 
heterogeneity observed among independently and identically designed sys- 
tems. This capability is distinct from the random effect model per se. A 
random effect model in general maintains a static device for handling het- 
erogeneity stemming from independent, but possibly different individuals 
within a system. As we demonstrated through simulated as well as real data 
analysis, the random effect model works well when all involved systems of 
many agents are rather homogeneous. In contrast, when a system of many 
agents has the potential to build up system-wise self-organized behaviors, 
it would be worth modeling the system dynamics by properly capturing 
the underlying mechanism. Therefore, our mass event-history model is not 
only an alternative to the random effect model, but an important modeling 
technique on its own. 

From the perspective of dynamic differences, we lay out three temporally 
oriented aspects in Section 5. Through these aspects, we point out the fun- 
damental differences between the two dynamics governed by the SSB mass 
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event-history model and Logistic regression model with random effect. Al- 
though the resultant dynamic differences are informative, we believe that 
other important and essential perspectives could have been missed in our 
discussion. Given that this topic of comparing two dynamic systems is not 
yet well established, research is needed in this direction in statistics. 

In the nematode IJ invasion example discussed here, the unobserved
state-space variable could be thought of as having a physical existence as
well as being created in an unsupervised fashion. It may exist as a
physiological state that keeps the majority of IJs from making their invading
decisions. Only after a few risk-prone extremists or leaders have invaded the
host will the rest of the risk-averse majority follow. This type of behavior
is indeed seen in finance and many other social sciences. How to properly
model the macroscopic correlation resulting from information cascading to the
majority is an important issue. Certainly, our internal state-space with
impermissible and permissible states would not be sophisticated enough to cope
with the complexity generated by more intricate decision-making systems. Even
for IJs, we expect that our simple state-space model structure would be too
simple when event-history data much more informative than the current-status
data are collected. Such a possibility is likely to arise if advances in
experimental and data collection technologies are made in the near future.

For current status data collected by sacrificing each experimental system
at a time point, the scope for extending the SSB mass event-history model
could be rather limited. The limitations stem from the compositional and
missing data structures involved. The presence of integration in the
likelihood function, or marginal probability $\Pr[N(t) = k \mid \theta]$, from
time 0 up to the several sacrificing time points $t_i$, $i = 1, \ldots, I$,
imposes a limit on the number of parameters that are identifiable and
estimable. Thus, we need to employ parametric distributions in this setting.
Further, as one condition of the SSB mass event-history model, the two
compositional distributions involved must not belong to the same family.
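
To see where the integration enters, one plausible way to write the marginal probability is sketched below. The notation is introduced here and is an assumption rather than a formula taken from the paper: $S$ is the common duration of the impermissible state with density $g(\cdot \mid \lambda, \gamma)$, $F(\cdot \mid \alpha, \beta)$ is the distribution function of an individual's time to action once the permissible state begins, and $M$ is the number of agents, which are taken to act independently given $S$.

```latex
\begin{align*}
\Pr[N(t) = k \mid \theta]
  &= \mathbf{1}\{k = 0\}\,\Pr[S > t \mid \lambda, \gamma] \\
  &\quad + \int_0^t \binom{M}{k} F(t - s \mid \alpha, \beta)^k
     \bigl[1 - F(t - s \mid \alpha, \beta)\bigr]^{M - k}\, g(s \mid \lambda, \gamma)\, ds .
\end{align*}
```

Under this sketch, only one count per system is observed and the state duration $S$ is never seen, so it has to be integrated out. The data therefore inform the duration parameters only through this mixture, which is consistent with the limit on identifiable parameters noted above.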

However, when the above possibility becomes reality and we can construct
a setting where complete individual event-history data are available,
the identifiability issue can be alleviated, even while the internal state
variable information is still missing. Thus, a modeling extension with one
semi-parametric model for time to action in the permissible state and one
parametric model for durations under the impermissible state becomes feasible.
Furthermore, the 0-1 internal state-space variable used in the dynamic
systems here certainly can be expanded to properly accommodate further
complexity of data structure.
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SUPPLEMENTARY MATERIAL 

Score and information (DOI: 10.1214/08-AOAS189SUPP; .pdf). Here we 
give the gradient and second derivative of the log-likelihood for constructing 
the score and information, which can be used in numerical estimation of the 
parameter $\theta = (\alpha, \beta, \lambda, \gamma)'$ in the SSB mass event-history model.
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